Computer Systems and Computer-Implemented Methods of Use Thereof Configured to Recognize User Activity During User Interaction with Electronic Computing Devices

ABSTRACT

A computer-implemented method and system entail continuously tracking, in a visual input, a plurality of representations of at least one eye of at least one user over a predetermined time duration. The method and system also entail continuously applying at least one eye-gaze movement tracking (EGMT) algorithm to the visual input to form a time series of eye-gaze vectors and continuously inputting the time series of eye-gaze vectors into an Activity Tracking Neural Network (ATNN). The ATNN classifies at least one activity of the at least one user over the predetermined time duration and outputs a measure of the at least one user's engagement with the classified activity.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Application No. 62/701,106, entitled “COMPUTER SYSTEMS AND COMPUTER-IMPLEMENTED METHODS CONFIGURED TO TRACK NUMEROUS USER-RELATED PARAMETERS DURING USERS' INTERACTION WITH ELECTRONIC COMPUTING DEVICES”, filed on Jul. 20, 2018, incorporated herein in its entirety.

FIELD OF TECHNOLOGY

The present disclosure generally relates to improved computer-based systems and improved computing devices configured for tracking eye-related parameters during user interaction with electronic computing devices.

SUMMARY OF DESCRIBED SUBJECT MATTER

In some embodiments, the present disclosure provides an exemplary technically improved computer-based method that includes continuously obtaining, by at least one processor, a visual input comprising a plurality of representations of at least one eye of at least one user to continuously track the plurality of representations over a predetermined time duration; wherein the visual input comprises a series of video frames, a series of images, or both; continuously applying, by the at least one processor, at least one eye-gaze movement tracking (EGMT) algorithm to the visual input to form a time series of eye-gaze vectors; continuously inputting, by the at least one processor, the time series of eye-gaze vectors into an Activity Tracking Neural Network (ATNN) to: classify at least one activity of the at least one user over the predetermined time duration; and output a measure of the at least one user's engagement with the classified activity.

Some embodiments of the present disclosure relate to a system including: a camera component, wherein the camera component is configured to acquire a visual input, wherein the visual input includes a real-time representation of at least one eye of at least one user and wherein the visual input comprises at least one video frame, at least one image, or both; at least one processor; a non-transitory computer memory, storing a computer program that, when executed by the at least one processor, causes the at least one processor to: continuously apply at least one eye-gaze movement tracking (EGMT) algorithm to the visual input to form a time series of eye-gaze vectors; continuously input the time series of eye-gaze vectors into an Activity Tracking Neural Network (ATNN) to determine an attentiveness level of the at least one user over the predetermined time duration to: classify at least one activity of the at least one user over the predetermined time duration; and output a measure of the at least one user's engagement with the at least one classified activity.

In some embodiments, the visual input includes a plurality of representations of at least one additional facial feature of the at least one user, wherein the at least one additional facial feature is chosen from at least one of: eye gaze, head pose, a distance between a user's face and at least one screen, head posture, at least one detected emotion, or combinations thereof.

In some embodiments, the at least one processor continuously applies to the visual input, at least one facial feature algorithm, wherein the at least one facial feature algorithm is chosen from at least one of: at least one face detection algorithm, at least one face tracking algorithm, at least one head pose estimation algorithm, at least one emotion recognition algorithm, or combinations thereof.

In some embodiments, application of the at least one facial feature algorithm transforms the representation of the at least one additional facial feature of the at least one user into at least one additional facial feature vector associated with the at least one additional facial feature, wherein the at least one facial feature vector is chosen from: at least one face angle vector, at least one facial coordinate vector, or a combination thereof.

In some embodiments, the processor continually obtains a time series of additional facial feature vectors.

In some embodiments, the plurality of representations includes at least one eye movement of at least one user.

In some embodiments, the ATNN is trained using a plurality of representations of a plurality of users, wherein each representation depicts each user engaged in at least one activity.

In some embodiments, the predetermined time duration ranges from 1 to 300 minutes.

In some embodiments, the at least one eye gaze vector includes at least two reference points, the at least two reference points including: a first reference point corresponding to an eye pupil; and a second reference point corresponding to an eye center.

In some embodiments, the at least one eye gaze vector is a plurality of eye gaze vectors, wherein the plurality of eye gaze vectors includes at least one first eye gaze vector corresponding to a first eye and at least one second eye gaze vector corresponding to a second eye.

In some embodiments, the at least one processor averages the at least one first eye gaze vector and the at least one second eye gaze vector.

In some embodiments, the at least one activity is chosen from: reading, watching video, surfing the internet, writing text, programming, or combinations thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the present disclosure can be further explained with reference to the attached drawings, wherein like structures are referred to by like numerals throughout the several views. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the present disclosure. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ one or more illustrative embodiments.

FIG. 1 illustrates an exemplary environment in accordance with at least some embodiments of the present disclosure.

FIG. 2 illustrates an exemplary disclosed system for tracking users' activity.

FIG. 3 illustrates exemplary software configured to track one or more of user activities and/or user-related parameters in accordance with at least some embodiments of the present invention.

FIGS. 4A-4D illustrate examples of eye-gaze track classification patterns used for training an exemplary neural network.

FIG. 5 illustrates an overall architecture of an exemplary disclosed convolutional neural network configured/trained for eye-gaze movement classification tasks. The inputs (eye-gaze tracks) of an exemplary disclosed convolutional neural network and the outputs (activity classes) are shown in FIGS. 4A-4D.

FIGS. 6A-6B illustrate details of the exemplary disclosed convolutional neural network (shown in FIG. 5) configured for eye gaze motion classification.

DETAILED DESCRIPTION

Embodiments of the present disclosure, briefly summarized above and discussed in greater detail below, can be understood by reference to the illustrative embodiments of the disclosure depicted in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this disclosure and are therefore not to be considered limiting of its scope, for the disclosure may admit to other equally effective embodiments.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. The figures are not drawn to scale and may be simplified for clarity. It is contemplated that elements and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.

Among those benefits and improvements that have been disclosed, other objects and advantages of this disclosure can become apparent from the following description taken in conjunction with the accompanying figures. Detailed embodiments of the present disclosure are disclosed herein; however, it is to be understood that the disclosed embodiments are merely illustrative of the disclosure that may be embodied in various forms. In addition, each of the examples given in connection with the various embodiments of the present disclosure is intended to be illustrative, and not restrictive.

Throughout the specification, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise. The phrases “in one embodiment” and “in some embodiments” as used herein do not necessarily refer to the same embodiment(s), though they may. Furthermore, the phrases “in another embodiment” and “in some other embodiments” as used herein do not necessarily refer to a different embodiment, although they may. Thus, as described below, various embodiments of the disclosure may be readily combined, without departing from the scope or spirit of the disclosure. Further, when a particular feature, structure, or characteristic is described in connection with an implementation, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other implementations whether or not explicitly described herein.

The term “based on” is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise. In addition, throughout the specification, the meaning of “a,” “an,” and “the” includes plural references. The meaning of “in” includes “in” and “on.”

It is understood that at least one aspect/functionality of various embodiments described herein can be performed in real-time and/or dynamically. As used herein, the term “real-time” is directed to an event/action that can occur instantaneously or almost instantaneously in time when another event/action has occurred. For example, the “real-time processing,” “real-time computation,” and “real-time execution” all pertain to the performance of a computation during the actual time that the related physical process (e.g., a user interacting with an application on a mobile device) occurs, in order that results of the computation can be used in guiding the physical process.

As used herein, the term “dynamically” means that events and/or actions can be triggered and/or occur without any human intervention. In some embodiments, events and/or actions in accordance with the present disclosure can be in real-time and/or based on a predetermined periodicity of at least one of: nanosecond, several nanoseconds, millisecond, several milliseconds, second, several seconds, minute, several minutes, hourly, several hours, daily, several days, weekly, monthly, etc.

As used herein, the term “runtime” corresponds to any behavior that is dynamically determined during an execution of a software application or at least a portion of software application.

In some embodiments, the disclosed specially programmed computing systems with associated devices are configured to operate in the distributed network environment, communicating over a suitable data communication network (e.g., the Internet, etc.) and utilizing at least one suitable data communication protocol (e.g., IPX/SPX, X.25, AX.25, AppleTalk™, TCP/IP (e.g., HTTP), etc.). Of note, the embodiments described herein may, of course, be implemented using any appropriate hardware and/or computing software languages. In this regard, those of ordinary skill in the art are well versed in the type of computer hardware that may be used, the type of computer programming techniques that may be used (e.g., object oriented programming), and the type of computer programming languages that may be used (e.g., C++, Objective-C, Swift, Java, Javascript). The aforementioned examples are, of course, illustrative and not restrictive.

As used herein, the terms “image(s)” and “image data” are used interchangeably to identify data representative of visual content which includes, but is not limited to, images encoded in various computer formats (e.g., “.jpg,” “.bmp,” etc.), streaming video based on various protocols (e.g., Real-time Streaming Protocol (RTSP), Real-time Transport Protocol (RTP), Real-time Transport Control Protocol (RTCP), etc.), recorded/generated non-streaming video of various formats (e.g., “.mov,” “.mpg,” “.wmv,” “.avi,” “.flv,” etc.), and real-time visual imagery acquired through a camera application on a mobile device.

The material disclosed herein may be implemented in software or firmware or a combination of them or as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.

In another form, a non-transitory article, such as a non-transitory computer readable medium, may be used with any of the examples mentioned above or other examples except that it does not include a transitory signal per se. It does include those elements other than a signal per se that may hold data temporarily in a “transitory” fashion such as RAM and so forth.

As used herein, the terms “computer engine” and “engine” identify at least one software component and/or a combination of at least one software component and at least one hardware component which are designed/programmed/configured to manage/control other software and/or hardware components (such as the libraries, software development kits (SDKs), objects, etc.).

Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate array (FPGA), logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth. In some embodiments, the one or more processors may be implemented as a Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) processors; x86 instruction set compatible processors, multi-core, or any other microprocessor or central processing unit (CPU). In various implementations, the one or more processors may be dual-core processor(s), dual-core mobile processor(s), and so forth.

Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.

One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor, which when read by a machine causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that make the logic or processor.

As used herein, the term “user” shall have a meaning of at least one user.

As used herein, the terms “face” and “head” are used interchangeably and both refer to any portion of a user's body situated above the user's shoulders. The terms “face” and “head” are meant to encompass any accessories worn by the user in the portion of the user's body above the shoulders including but not limited to, a hat, glasses, jewelry and the like.

The present disclosure, among other things, provides exemplary technical solutions to the technical problem of measuring and tracking a user's engagement with an electronic computing device.

In some embodiments, the electronic computing device may be, without limitation, any electronic computing device that at least includes, and/or is operationally associated with at least one other electronic computing device that includes, at least one processor, a digital camera, and the disclosed software. For example, an exemplary electronic computing device may be at least one selected from the group of desktop, laptop, mobile device (e.g., tablet, smartphone, etc.), Internet-of-Things (IoT) device (e.g., smart thermostat), and the like. In some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to track one or more users' interactions with at least one exemplary electronic computing device as the one or more users interact with the at least one exemplary electronic computing device and/or another electronic device (e.g., another electronic computing device).

In some embodiments, since the at least one exemplary electronic computing device may include at least one camera that acquires visual input related to the one or more users' activities, the exemplary disclosed software with the exemplary disclosed computer system are configured to detect and recognize, for example without limitation, at least one or more of the following: face pose, head pose, anthropometrics, facial expression(s), emotion(s), eye(s) and eye-gaze vector(s). In some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to estimate a type of activity each user is engaged in (e.g., reading text, watching video, surfing the Internet, etc.).

In some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to process the visual input (e.g., a set of portrait images) to perform at least one or more of the following:

i) real-time tracking of a user's level of attention to determine the quality of “commitment” to (e.g., interest in) a particular electronic content and/or a particular electronic device (e.g., the computer associated with the camera from which the visual input has been obtained, another electronic device), and

ii) discriminating between several types of a user's activities (e.g., reading, watching video, surfing the Internet, writing text, programming, etc.).

For example, for the reading activity, the exemplary disclosed software with the exemplary disclosed computer system are configured to estimate how often user(s) lose(s) his/her/their attention by tracking the respective eye-gaze.

In some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to perform the face detection and tracking in accordance with, without limitation, one or more techniques, methodologies, and/or systems detailed in U.S. Pat. No. 10,049,260, the specific disclosure of which is incorporated herein by reference in its entirety for such purpose. In some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to perform the eye-gaze movement tracking (EGMT) in accordance with, without limitation, one or more techniques, methodologies, and/or systems detailed herein, each of such specific disclosures is incorporated herein by reference in its entirety for such purpose. For example, as detailed herein, in at least some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to calculate at least one eye-gaze vector using at least the two reference points of the eye pupil and eye center.

In some embodiments, two eye-gaze vectors may be calculated, one vector for each of the user's eyes. In some embodiments, the two eye-gaze vectors may be averaged. In some embodiments, one or two eye-gaze vectors may be used as inputs for the disclosed activity recognition system. For example, the time dependence of eye-gaze vectors may be recorded and the resulting tracks may be used as an input for the disclosed activity tracking neural network (ATNN). The exemplary tracks for some classes of user activities are shown in FIGS. 4A-4D. In some embodiments, the exemplary eye-gaze vector may be defined in a coordinate system associated with the computing device camera. For example, in some embodiments, the eye-gaze vector may be (x,y,z)=(0, 0, 1) for the user who looks directly at the computing device camera.
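By way of non-limiting illustration only, the eye-gaze vector calculation described above may be sketched in Python as follows. The sketch assumes that the eye center and the eye pupil are available as three-dimensional points expressed in the camera coordinate system, with the z axis oriented so that a gaze directed at the camera yields (0, 0, 1). The function names gaze_vector and average_gaze, the normalization to unit length, and the sample coordinates are hypothetical and are not taken from the disclosure.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Sequence[float]) -> Vec3:
    """Scale a 3-D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("a zero-length vector has no direction")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def gaze_vector(eye_center: Sequence[float], pupil: Sequence[float]) -> Vec3:
    """Unit vector pointing from the eye center through the pupil.

    Assumption: both reference points are 3-D and share one coordinate system.
    """
    return normalize([p - c for p, c in zip(pupil, eye_center)])


def average_gaze(first_eye: Vec3, second_eye: Vec3) -> Vec3:
    """Average the two per-eye gaze vectors and renormalize the result."""
    return normalize([(a + b) / 2.0 for a, b in zip(first_eye, second_eye)])


# Hypothetical sample: both pupils are displaced from the eye centers along +z,
# i.e., the user looks directly at the camera.
left = gaze_vector((-3.0, 0.0, 0.0), (-3.0, 0.0, 1.2))
right = gaze_vector((3.0, 0.0, 0.0), (3.0, 0.0, 1.2))
combined = average_gaze(left, right)
```

Appending one such combined vector per video frame to a list would yield a time series of eye-gaze vectors of the kind that may be used as an input track.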

In some embodiments, as detailed herein, the exemplary disclosed software with the exemplary disclosed computer system are configured to be applied, without limitation, for one or more of the following uses: working environment and/or information safety, advisory software for people who spend time using computers and electronic devices, parental control systems and other similar suitable computer-related activities and uses.

FIG. 1 illustrates an exemplary environment 100 in accordance with at least some embodiments of the present invention. As shown in FIG. 1, environment 100 may include a user 101, a computer or mobile device 103, a camera 104 and a server 105. Other devices may also be included. The computer 103 may include any appropriate type of computer, such as desktops, laptops, mobile devices, such as, but not limited to, mobile phones, smartphones and tablets, or any other similarly suitable devices. The exemplary camera 104 may be a built-in camera, or an external camera or any other suitable camera. Further, the server 105 may include any appropriate type of server computer or a plurality of server computers. The user 101 may interact 102 with the computer 103 and the camera 104. The camera 104 continuously tracks the user activity in accordance with one or more principles of the present invention as detailed herein. The user 101 may be a single user or a plurality of users. The computer 103 and the server 105 may be implemented on any appropriate computing circuitry platform as detailed herein.

FIG. 2 illustrates an exemplary disclosed system 200 for tracking user activity. As shown in FIG. 2, in some embodiments, as detailed herein, the exemplary disclosed software with the exemplary disclosed computer system are configured to track users' activity in at least the following stages. During the stage 201, an exemplary specialized processor, executing the exemplary disclosed software, is programmed to receive, in real-time, the visual input (e.g., a series of images) taken by a camera 104 and provide it to one or more machine learning algorithms 202, which dynamically process, in real-time, the visual input (e.g., a series of images) to detect and track the user's face, automatically segment parts of the user's face, and, optionally, subtract background. In some embodiments, as detailed herein, the trained data (e.g., binary model files for geometrical head, face, eyes, etc.) for the one or more machine learning and deep learning algorithms 202 (e.g., the EGMT algorithm) may be prepared separately 203 on one or more servers 105. In some embodiments, based on the results of the one or more disclosed algorithms 202, the one or more disclosed algorithms 204 analyze types of the user's activity (e.g., reading, watching video, surfing the net, typing, etc.).

FIG. 3 illustrates the exemplary disclosed software with the exemplary disclosed computer system that are configured to track one or more of user activities and/or user-related parameters in accordance with at least some embodiments of the present invention. For example, the exemplary disclosed software with the exemplary disclosed computer system are configured to continuously process the visual input (e.g., video frames) utilizing one or more face recognition algorithms 301 (e.g., one or more techniques, methodologies, and/or systems detailed in U.S. Pat. No. 10,049,260). For example, the one or more face recognition algorithms 301 may include combined regressors (random forests+linear regressions) that take local binary features and fit a human face with a three-dimensional face model, which may include one or more of the following meta-parameters: camera position (e.g., rotations, translations, etc.), facial anthropometric variables, facial expression variables, light vector, etc. For example, the one or more face recognition algorithms 301 may be executed in face detection and/or face tracking modes. For example, the exemplary disclosed software with the exemplary disclosed computer system are configured to use the output from applying the one or more face recognition algorithms 301 to generate, in real-time, a complete three-dimensional face model. For example, the disclosed eye-gaze detection module 302 may include one or more algorithms aimed to estimate both the eye-gaze vector and the eye-gaze point in the screen coordinates (e.g., one or more techniques, methodologies, and/or systems detailed herein). For example, the exemplary disclosed software with the exemplary disclosed computer system are configured to detect eye-gaze based, at least in part, on the results obtained by the face recognition 301, as detailed further, for example without limitation, with respect to FIGS. 6A-6B. For example, the exemplary disclosed eye-gaze motion analysis may include at least one machine-learning algorithm 303 (e.g., the disclosed EGMT algorithm) that classifies eye-gaze tracks in the screen coordinates into several types (e.g., reading, typing, watching video, etc.) 304, as detailed further, for example without limitation, with respect to FIGS. 6A-6B. In some embodiments, keyboard input is analyzed to determine whether a user is reading or typing.
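By way of non-limiting illustration only, the use of keyboard input to distinguish reading from typing may be sketched in Python as follows. The disclosure does not specify how the keyboard input is analyzed; the sketch assumes that keystroke timestamps (in seconds) are available and applies a simple rate threshold over a time window. The function names and the default threshold of 0.5 keystrokes per second are hypothetical.

```python
from typing import Sequence


def keystroke_rate(key_times: Sequence[float],
                   window_start: float,
                   window_end: float) -> float:
    """Keystrokes per second observed inside [window_start, window_end)."""
    if window_end <= window_start:
        raise ValueError("window_end must be later than window_start")
    count = sum(1 for t in key_times if window_start <= t < window_end)
    return count / (window_end - window_start)


def reading_or_typing(key_times: Sequence[float],
                      window_start: float,
                      window_end: float,
                      min_typing_rate: float = 0.5) -> str:
    """Label a window as "typing" when the keystroke rate reaches the threshold.

    Assumption: min_typing_rate is an illustrative value, not a disclosed one.
    """
    rate = keystroke_rate(key_times, window_start, window_end)
    return "typing" if rate >= min_typing_rate else "reading"


# Hypothetical sample: five keystrokes within a two-second window.
label = reading_or_typing([0.1, 0.4, 0.9, 1.3, 1.8], 0.0, 2.0)
```

In such a sketch, the resulting label could be used to separate two activity classes whose eye-gaze tracks look similar, leaving the remaining classes to the eye-gaze track classification.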

For example, regarding the estimation of the reading speed, in at least some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to determine how many lines per time period (e.g., per minute, etc.) a person (e.g., a child) reads by, for example without limitation, using data to determine amplitude(s) per time period.
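By way of non-limiting illustration only, one possible way to estimate lines read per minute from an eye-gaze track may be sketched in Python as follows. The sketch assumes left-to-right text, a uniformly sampled horizontal gaze coordinate, and that each completed line ends with a large leftward movement (a return sweep) whose amplitude exceeds a threshold. The function names, the threshold min_sweep, and the sample data are hypothetical and are not taken from the disclosure.

```python
from typing import Sequence


def count_return_sweeps(xs: Sequence[float], min_sweep: float) -> int:
    """Count leftward runs whose accumulated amplitude reaches min_sweep."""
    sweeps = 0
    drop = 0.0       # leftward distance accumulated in the current run
    counted = False  # whether the current run has already been counted
    for prev, cur in zip(xs, xs[1:]):
        step = cur - prev
        if step < 0:
            drop -= step
            if not counted and drop >= min_sweep:
                sweeps += 1
                counted = True
        else:
            drop = 0.0
            counted = False
    return sweeps


def lines_per_minute(xs: Sequence[float],
                     sample_rate_hz: float,
                     min_sweep: float) -> float:
    """Return sweeps per minute for a uniformly sampled horizontal gaze track."""
    if len(xs) < 2:
        return 0.0
    duration_min = (len(xs) - 1) / sample_rate_hz / 60.0
    return count_return_sweeps(xs, min_sweep) / duration_min


# Hypothetical sample: 61 samples at 1 Hz (one minute); the gaze crosses the
# normalized screen width in three samples and then jumps back to the left.
track_x = [0.0, 0.5, 1.0] * 20 + [0.0]
speed = lines_per_minute(track_x, sample_rate_hz=1.0, min_sweep=0.6)
```

Accumulating the leftward distance over consecutive samples, rather than testing single steps, lets a return sweep that spans several video frames be counted once.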

For example, regarding determining focus levels, in at least some embodiments, the exemplary disclosed software with the exemplary disclosed computer system are configured to analyze “stationary” patterns of the eye-gaze curves.
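By way of non-limiting illustration only, one simple measure of “stationary” behavior in an eye-gaze curve may be sketched in Python as follows. The disclosure does not define how stationary patterns are detected or how they relate to a focus level; the sketch assumes a uniformly sampled track of gaze points in normalized screen coordinates and reports the fraction of sample-to-sample steps whose displacement stays at or below a threshold. The function name stationary_fraction, the threshold max_step, and the sample data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def stationary_fraction(track: Sequence[Tuple[float, float]],
                        max_step: float) -> float:
    """Fraction of consecutive steps with displacement <= max_step."""
    if len(track) < 2:
        return 0.0
    still = sum(
        1
        for (x0, y0), (x1, y1) in zip(track, track[1:])
        if math.hypot(x1 - x0, y1 - y0) <= max_step
    )
    return still / (len(track) - 1)


# Hypothetical sample: the gaze dwells, jumps across the screen, then dwells.
points = [(0.0, 0.0), (0.01, 0.0), (0.01, 0.01), (0.5, 0.5), (0.5, 0.5)]
fraction = stationary_fraction(points, max_step=0.05)
```

How such a fraction would be interpreted (for example, whether long dwell indicates attention or disengagement) depends on the classified activity and is left open here.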

FIGS. 4A-4D illustrate examples of eye-gaze track classification patterns used for training the exemplary disclosed convolutional neural network.

FIG. 5 illustrates an overall architecture of an exemplary disclosed convolutional neural network configured/trained for eye-gaze movement classification tasks. The inputs (eye-gaze tracks) of the exemplary disclosed convolutional neural network and the outputs (activity classes) are shown in FIGS. 4A-4D.

FIGS. 6A-6B illustrate details (layers) of the exemplary disclosed convolutional neural network (shown in FIG. 5) configured for the eye-gaze motion classification problem.
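By way of non-limiting illustration only, the general data path of a convolutional classifier operating on an eye-gaze track may be sketched in Python as follows. The sketch does not reproduce the layers shown in FIGS. 5 and 6A-6B; it assumes a single one-dimensional convolution over time, a ReLU, global average pooling, one dense layer, and a softmax over activity classes. The class list, layer sizes, function names, and the randomly initialized (untrained) weights are all hypothetical.

```python
import math
import random
from typing import List, Sequence

ACTIVITIES = ["reading", "watching video", "surfing the internet"]


def conv1d_relu(track: Sequence[Sequence[float]],
                kernels: Sequence[Sequence[Sequence[float]]],
                biases: Sequence[float]) -> List[List[float]]:
    """Valid 1-D convolution over time, then ReLU.

    track is T x C, kernels is F x K x C; the result is (T - K + 1) x F.
    """
    k = len(kernels[0])
    if len(track) < k:
        raise ValueError("track is shorter than the kernel")
    channels = len(track[0])
    out = []
    for t in range(len(track) - k + 1):
        row = []
        for kern, b in zip(kernels, biases):
            s = b + sum(kern[i][c] * track[t + i][c]
                        for i in range(k) for c in range(channels))
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_average_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Average each feature channel over the time axis."""
    return [sum(col) / len(features) for col in zip(*features)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(track, kernels, biases, dense, dense_bias) -> List[float]:
    """Return one probability per entry of ACTIVITIES."""
    pooled = global_average_pool(conv1d_relu(track, kernels, biases))
    logits = [b + sum(w * p for w, p in zip(ws, pooled))
              for ws, b in zip(dense, dense_bias)]
    return softmax(logits)


# Untrained, randomly initialized weights, used only to exercise the data path.
random.seed(0)
F, K, C = 4, 5, 3
kernels = [[[random.uniform(-1, 1) for _ in range(C)] for _ in range(K)]
           for _ in range(F)]
biases = [0.0] * F
dense = [[random.uniform(-1, 1) for _ in range(F)] for _ in ACTIVITIES]
dense_bias = [0.0] * len(ACTIVITIES)
track = [(0.0, 0.0, 1.0)] * 16  # 16 frames of a gaze fixed on the camera
probs = classify(track, kernels, biases, dense, dense_bias)
```

With trained weights, the index of the largest probability would select the classified activity; with the random weights above, the output only demonstrates the shapes involved.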

While a number of embodiments of the present disclosure have been described, it is understood that these embodiments are illustrative only, and not restrictive, and that many modifications may become apparent to those of ordinary skill in the art, including that various embodiments of the disclosed methodologies, the disclosed systems, and the disclosed devices described herein can be utilized in any combination with each other. Further still, the various steps may be carried out in any desired order (and any desired steps may be added and/or any desired steps may be eliminated).

At least some aspects of the present disclosure will now be described with reference to the following numbered clauses hereinafter designated as [C1, C2, C3, C4 . . . ]

C1: A computer-implemented method, comprising: continuously obtaining, by at least one processor, a visual input comprising a plurality of representations of at least one eye of at least one user to continuously track the plurality of representations over a predetermined time duration; wherein the visual input comprises a series of video frames, a series of images, or both; continuously applying, by the at least one processor, at least one eye-gaze movement tracking (EGMT) algorithm to the visual input to form a time series of eye-gaze vectors; continuously inputting, by the at least one processor, the time series of eye-gaze vectors into an Activity Tracking Neural Network (ATNN) to: classify at least one activity of the at least one user over the predetermined time duration; and output a measure of the at least one user's engagement with the classified activity.

C2: The method of C1, wherein the visual input further comprises a plurality of representations of at least one additional facial feature of the at least one user, wherein the at least one additional facial feature is chosen from at least one of: eye gaze, head pose, a distance between a user's face and at least one screen, head posture, at least one detected emotion, or combinations thereof.

C3: The method of C2, further comprising, by the at least one processor, continuously applying to the visual input, at least one facial feature algorithm, wherein the at least one facial feature algorithm is chosen from at least one of: at least one face detection algorithm, at least one face tracking algorithm, at least one head pose estimation algorithm, at least one emotion recognition algorithm, or combinations thereof.

C4: The method of C3, wherein application of the at least one facial feature algorithm transforms the representation of the at least one additional facial feature of the at least one user into at least one additional facial feature vector associated with the at least one additional facial feature, wherein the at least one facial feature vector is chosen from: at least one face angle vector, at least one facial coordinate vector, or a combination thereof.

C5: The method of C4, further comprising, with the at least one processor, continuously obtaining a time series of additional facial feature vectors.

C6: The method of C5, wherein the plurality of representations comprises at least one eye movement of at least one user.

C7: The method of C1, wherein the ATNN is trained using a plurality of representations of a plurality of users, wherein each representation depicts each user engaged in at least one activity.

C8: The method of C1, wherein the predetermined time duration ranges from 1 to 300 minutes.

C9: The method of C1, wherein the at least one eye gaze vector comprises at least two reference points, the at least two reference points comprising: a first reference point corresponding to an eye pupil; and a second reference point corresponding to an eye center.

C10: The method of C1, wherein the at least one eye gaze vector is a plurality of eye gaze vectors, wherein the plurality of eye gaze vectors comprises at least one first eye gaze vector corresponding to a first eye and at least one second eye gaze vector corresponding to a second eye.

C11: The method of C10, further comprising a step of, by the at least one processor, averaging the at least one first eye gaze vector and the at least one second eye gaze vector.

C12: The method of C1, wherein the at least one activity is chosen from: reading, watching video, surfing the internet, writing text, programming, or combinations thereof.

C13: A system comprising: a camera component, wherein the camera component is configured to acquire a visual input, wherein the visual input comprises a real-time representation of at least one eye of at least one user and wherein the visual input comprises at least one video frame, at least one image, or both; at least one processor; a non-transitory computer memory, storing a computer program that, when executed by the at least one processor, causes the at least one processor to: continuously apply at least one eye-gaze movement tracking (EGMT) algorithm to the visual input to form a time series of eye-gaze vectors; continuously input the time series of eye-gaze vectors into an Activity Tracking Neural Network (ATNN) to determine an attentiveness level of the at least one user over the predetermined time duration to: classify at least one activity of the at least one user over the predetermined time duration; and output a measure of the at least one user's engagement with the at least one classified activity.

C14: The system of C13, wherein the visual input further comprises a plurality of representations of at least one additional facial feature of the at least one user, wherein the at least one additional facial feature is chosen from at least one of: eye gaze, head pose, a distance between a user's face and at least one screen, head posture, at least one detected emotion, or combinations thereof.

C15: The system of C13, wherein the at least one processor continually applies to the visual input, at least one facial feature algorithm, wherein the at least one facial feature algorithm is chosen from at least one of: at least one face detection algorithm, at least one face tracking algorithm, at least one head pose estimation algorithm, at least one emotion recognition algorithm, or combinations thereof.

C16: The system of C15, wherein application of the at least one facial feature algorithm transforms the representation of the at least one additional facial feature of the at least one user into at least one additional facial feature vector associated with the at least one additional facial feature, wherein the at least one facial feature vector is chosen from: at least one face angle vector, at least one facial coordinate vector, or a combination thereof.

C17: The system of C16, wherein the at least one processor continually obtains a time series of additional facial feature vectors.

C18: The system of C13, wherein the plurality of representations comprises at least one eye movement of at least one user.

C19: The system of C13, wherein the ATNN is trained using a plurality of representations of a plurality of users, wherein each representation depicts each user engaged in at least one activity.

C20: The system of C13, wherein the predetermined time duration ranges from 1 to 300 minutes.

C21: The system of C13, wherein the at least one eye gaze vector comprises at least two reference points, the at least two reference points comprising: a first reference point corresponding to an eye pupil; and a second reference point corresponding to an eye center.

C22: The system of C13, wherein the at least one eye gaze vector is a plurality of eye gaze vectors, wherein the plurality of eye gaze vectors comprises at least one first eye gaze vector corresponding to a first eye and at least one second eye gaze vector corresponding to a second eye.

C23: The system of C13, wherein the at least one processor averages the at least one first eye gaze vector and the at least one second eye gaze vector.

C24: The system of C13, wherein the at least one activity is chosen from: reading, watching video, surfing the internet, writing text, programming, or combinations thereof.
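
By way of non-limiting illustration only, the eye gaze vector recited in C9 to C11 (and C21 to C23) may be sketched as follows. The sketch assumes that each reference point is a two-dimensional pixel coordinate, that the vector is taken from the eye center to the eye pupil, and that the two per-eye vectors are combined by an arithmetic mean. The function names `gaze_vector`, `average_gaze`, and `gaze_time_series` are hypothetical and do not appear in the disclosure, and none of these choices is required by it.

```python
from typing import Iterable, List, Tuple

# Assumption: reference points are 2-D pixel coordinates (x, y).
Point = Tuple[float, float]
Vector = Tuple[float, float]


def gaze_vector(eye_center: Point, pupil: Point) -> Vector:
    """Build a gaze vector from the two reference points of C9.

    Assumption: the vector points from the eye center to the eye pupil.
    """
    return (pupil[0] - eye_center[0], pupil[1] - eye_center[1])


def average_gaze(first: Vector, second: Vector) -> Vector:
    """Average a first-eye and a second-eye gaze vector, as in C11."""
    return ((first[0] + second[0]) / 2.0, (first[1] + second[1]) / 2.0)


def gaze_time_series(
    frames: Iterable[Tuple[Point, Point, Point, Point]]
) -> List[Vector]:
    """Form a time series of averaged gaze vectors, one per frame.

    Each frame is a hypothetical tuple of
    (first_eye_center, first_pupil, second_eye_center, second_pupil).
    """
    series: List[Vector] = []
    for first_center, first_pupil, second_center, second_pupil in frames:
        first = gaze_vector(first_center, first_pupil)
        second = gaze_vector(second_center, second_pupil)
        series.append(average_gaze(first, second))
    return series
```

In such a sketch, the list returned by `gaze_time_series` would stand in for the time series of eye-gaze vectors that is input into the ATNN. The ATNN itself is not modeled here.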

Publications cited throughout this document are hereby incorporated by reference in their entirety.

While one or more embodiments of the present disclosure have been described, it is understood that these embodiments are illustrative only, and not restrictive, and that many modifications may become apparent to those of ordinary skill in the art, including that various embodiments of the disclosed methodologies, the disclosed systems/platforms, and the disclosed devices described herein can be utilized in any combination with each other. Further still, the various steps may be carried out in any desired order (and any desired steps may be added and/or any desired steps may be eliminated). 

1. A computer-implemented method, comprising: continuously obtaining, by at least one processor, a visual input comprising a plurality of representations of at least one eye of at least one user to continuously track the plurality of representations over a predetermined time duration; wherein the visual input comprises a series of video frames, a series of images, or both; continuously applying, by the at least one processor, at least one eye-gaze movement tracking (EGMT) algorithm to the visual input to form a time series of eye-gaze vectors; continuously inputting, by the at least one processor, the time series of eye-gaze vectors into an Activity Tracking Neural Network (ATNN) to: classify at least one activity of the at least one user over the predetermined time duration; and output a measure of the at least one user's engagement with the classified activity.
 2. The method of claim 1, wherein the visual input further comprises a plurality of representations of at least one additional facial feature of the at least one user, wherein the at least one additional facial feature is chosen from at least one of: eye gaze, head pose, a distance between a user's face and at least one screen, head posture, at least one detected emotion, or combinations thereof.
 3. The method of claim 2, further comprising, by the at least one processor, continuously applying to the visual input, at least one facial feature algorithm, wherein the at least one facial feature algorithm is chosen from at least one of: at least one face detection algorithm, at least one face tracking algorithm, at least one head pose estimation algorithm, at least one emotion recognition algorithm, or combinations thereof.
 4. The method of claim 3, wherein application of the at least one facial feature algorithm transforms the representation of the at least one additional facial feature of the at least one user into at least one additional facial feature vector associated with the at least one additional facial feature, wherein the at least one facial feature vector is chosen from: at least one face angle vector, at least one facial coordinate vector, or a combination thereof.
 5. The method of claim 4, further comprising, with the at least one processor, continuously obtaining a time series of additional facial feature vectors.
 6. The method of claim 1, wherein the plurality of representations comprises at least one eye movement of at least one user.
 7. The method of claim 1, wherein the ATNN is trained using a plurality of representations of a plurality of users, wherein each representation depicts each user engaged in at least one activity.
 8. The method of claim 1, wherein the predetermined time duration ranges from 1 to 300 minutes.
 9. The method of claim 1, wherein the at least one eye gaze vector comprises at least two reference points, the at least two reference points comprising: a first reference point corresponding to an eye pupil; and a second reference point corresponding to an eye center.
 10. The method of claim 1, wherein the at least one eye gaze vector is a plurality of eye gaze vectors, wherein the plurality of eye gaze vectors comprises at least one first eye gaze vector corresponding to a first eye and at least one second eye gaze vector corresponding to a second eye.
 11. The method of claim 10, further comprising a step of, by the at least one processor, averaging the at least one first eye gaze vector and the at least one second eye gaze vector.
 12. The method of claim 1, wherein the at least one activity is chosen from: reading, watching video, surfing the internet, writing text, programming, or combinations thereof.
 13. A system comprising: a camera component, wherein the camera component is configured to acquire a visual input, wherein the visual input comprises a real-time representation of at least one eye of at least one user and wherein the visual input comprises at least one video frame, at least one image, or both; at least one processor; a non-transitory computer memory, storing a computer program that, when executed by the at least one processor, causes the at least one processor to: continuously apply at least one eye-gaze movement tracking (EGMT) algorithm to the visual input to form a time series of eye-gaze vectors; continuously input the time series of eye-gaze vectors into an Activity Tracking Neural Network (ATNN) to determine an attentiveness level of the at least one user over the predetermined time duration to: classify at least one activity of the at least one user over the predetermined time duration; and output a measure of the at least one user's engagement with the at least one classified activity.
 14. The system of claim 13, wherein the visual input further comprises a plurality of representations of at least one additional facial feature of the at least one user, wherein the at least one additional facial feature is chosen from at least one of: eye gaze, head pose, a distance between a user's face and at least one screen, head posture, at least one detected emotion, or combinations thereof.
 15. The system of claim 13, wherein the plurality of representations comprises at least one eye movement of at least one user.
 16. The system of claim 13, wherein the ATNN is trained using a plurality of representations of a plurality of users, wherein each representation depicts each user engaged in at least one activity.
 17. The system of claim 13, wherein the at least one eye gaze vector comprises at least two reference points, the at least two reference points comprising: a first reference point corresponding to an eye pupil; and a second reference point corresponding to an eye center.
 18. The system of claim 13, wherein the at least one eye gaze vector is a plurality of eye gaze vectors, wherein the plurality of eye gaze vectors comprises at least one first eye gaze vector corresponding to a first eye and at least one second eye gaze vector corresponding to a second eye.
 19. The system of claim 13, wherein the at least one processor averages the at least one first eye gaze vector and the at least one second eye gaze vector.
 20. The system of claim 13, wherein the at least one activity is chosen from: reading, watching video, surfing the internet, writing text, programming, or combinations thereof. 